A New Approach to the Study of Translationese: Machine-learning the Difference between Original and Translated Text
نویسندگان
چکیده
In this paper we describe an approach to the identification of “translationese” based on monolingual comparable corpora and machine learning techniques for text categorization. The paper reports on experiments in which support vector machines (SVMs) are employed to recognize translated text in a corpus of Italian articles from the geopolitical domain. An ensemble of SVMs reaches 86.7% accuracy with 89.3% precision and 83.3% recall on this task. A preliminary analysis of the features used by the SVMs suggests that the distribution of function words and morphosyntactic categories in general, and personal pronouns and adverbs in particular, are among the cues used by the SVMs to perform the discrimination task. A follow-up experiment shows that the performance attained by SVMs is well above the average performance of ten human subjects, including five professional translators, on the same task. Our results offer solid evidence supporting the translationese hypothesis, and our method seems to have promising applications in translation studies and in quantitative style analysis in general. Implications for the machine learning/text categorization community are equally important, both because this is a novel application, and especially because we provide explicit evidence that a relatively knowledge-poor machine learning algorithm can outperform human beings in a text classification task.
منابع مشابه
Translationese Traits in Romanian Newspapers: A Machine Learning Approach
This paper presents a machine learning approach to the investigation of the translationese effect on Romanian newspapers texts. The aim is to train a learning system to distinguish between translated and non-translated texts. The classifiers achieve an accuracy well above the chance level, the results confirming the existence of translationese manifestation. Also, the experiments investigate wh...
متن کاملIdentification of Translationese: A Machine Learning Approach
This paper presents a machine learning approach to the study of translationese. The goal is to train a computer system to distinguish between translated and non-translated text, in order to determine the characteristic features that influence the classifiers. Several algorithms reach up to 97.62% success rate on a technical dataset. Moreover, the SVM classifier consistently reports a statistica...
متن کاملMetadiscourse Markers: A Contrastive Study of Translated and Non-Translated Persuasive Texts
Metadiscourse features are those facets of a text, which make the organization of the text explicit, provide information about the writer's attitude toward the text content, and engage the reader in the interaction. This study interpreted metadiscourse markers in translated and non-translated persuasive texts. To this end, the researcher chose the translated versions of one of the leading newsp...
متن کاملStatistical Machine Translation with Automatic Identification of Translationese
Translated texts (in any language) are so markedly different from original ones that text classification techniques can be used to tease them apart. Previous work has shown that awareness to these differences can significantly improve statistical machine translation. These results, however, required meta-information on the ontological status of texts (original or translated) which is typically ...
متن کاملTranslationese: Between Human and Machine Translation
Translated texts, in any language, have unique characteristics that set them apart from texts originally written in the same language. Translation Studies is a research field that focuses on investigating these characteristics. Until recently, research in machine translation (MT) has been entirely divorced from translation studies. The main goal of this tutorial is to introduce some of the find...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- LLC
دوره 21 شماره
صفحات -
تاریخ انتشار 2006